Overview

Dataset statistics

Number of variables: 14
Number of observations: 14958
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.4 MiB
Average record size in memory: 168.0 B

Variable types

NUM: 12
CAT: 1
DATE: 1

Reproduction

Analysis started: 2021-05-09 08:11:59.729883
Analysis finished: 2021-05-09 08:12:33.222763
Duration: 33.49 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
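
The reported duration can be cross-checked from the two timestamps above. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and are not part of pandas-profiling.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2021-05-09 08:11:59.729883")
finished = datetime.fromisoformat("2021-05-09 08:12:33.222763")

# Subtracting two datetimes yields a timedelta; convert it to seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # prints "33.49 seconds"
```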

Warnings

bluecars_returned_sum is highly correlated with bluecars_taken_sum and 4 other fields (High correlation)
bluecars_taken_sum is highly correlated with bluecars_returned_sum and 4 other fields (High correlation)
utilib_returned_sum is highly correlated with utilib_taken_sum (High correlation)
utilib_taken_sum is highly correlated with utilib_returned_sum (High correlation)
utilib_14_taken_sum is highly correlated with bluecars_taken_sum and 2 other fields (High correlation)
utilib_14_returned_sum is highly correlated with bluecars_taken_sum and 2 other fields (High correlation)
slots_freed_sum is highly correlated with bluecars_taken_sum and 2 other fields (High correlation)
slots_taken_sum is highly correlated with bluecars_taken_sum and 2 other fields (High correlation)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
dayofweek has 2374 (15.9%) zeros (Zeros)
utilib_taken_sum has 4972 (33.2%) zeros (Zeros)
utilib_returned_sum has 4909 (32.8%) zeros (Zeros)
utilib_14_taken_sum has 2605 (17.4%) zeros (Zeros)
utilib_14_returned_sum has 2568 (17.2%) zeros (Zeros)
slots_freed_sum has 9492 (63.5%) zeros (Zeros)
slots_taken_sum has 9499 (63.5%) zeros (Zeros)

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 14958
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8046.0734723893565
Minimum: 0
Maximum: 16083
Zeros: 1
Zeros (%): < 0.1%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 804.7
Q1: 4030.25
Median: 8050.5
Q3: 12063.75
95-th percentile: 15280.15
Maximum: 16083
Range: 16083
Interquartile range (IQR): 8033.5

Descriptive statistics

Standard deviation: 4642.145244
Coefficient of variation (CV): 0.5769454206
Kurtosis: -1.199174841
Mean: 8046.073472
Median Absolute Deviation (MAD): 4017
Skewness: -0.00150783443
Sum: 120353167
Variance: 21549512.46
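
Several of the figures above are derived from others in the same block. As a rough sketch, the snippet below recomputes the range, IQR, coefficient of variation, and variance for df_index from the reported values, using the standard definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). The variable names are illustrative, and small differences in the last digits are expected because the inputs are rounded.

```python
# Values copied from the df_index block of this report.
minimum, maximum = 0, 16083
q1, q3 = 4030.25, 12063.75
mean, std = 8046.073472, 4642.145244

value_range = maximum - minimum  # reported Range: 16083
iqr = q3 - q1                    # reported IQR: 8033.5
cv = std / mean                  # reported CV: 0.5769454206
variance = std ** 2              # reported Variance: 21549512.46

print(value_range, iqr, round(cv, 6), round(variance, 1))
```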
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
2047                      1  < 0.1%
6822                      1  < 0.1%
15042                     1  < 0.1%
8897                      1  < 0.1%
10944                     1  < 0.1%
4791                      1  < 0.1%
6838                      1  < 0.1%
693                       1  < 0.1%
2740                      1  < 0.1%
12979                     1  < 0.1%
Other values (14948)  14948  99.9%

Minimum 5 values

Value                 Count  Frequency (%)
0                         1  < 0.1%
1                         1  < 0.1%
2                         1  < 0.1%
4                         1  < 0.1%
5                         1  < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
16083                     1  < 0.1%
16082                     1  < 0.1%
16081                     1  < 0.1%
16080                     1  < 0.1%
16079                     1  < 0.1%

postal_code
Real number (ℝ≥0)

Distinct count: 104
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88800.6020189865
Minimum: 75001
Maximum: 95880
Zeros: 0
Zeros (%): 0.0%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 75001
5-th percentile: 75006
Q1: 91330
Median: 92340
Q3: 93400
95-th percentile: 94500
Maximum: 95880
Range: 20879
Interquartile range (IQR): 2070

Descriptive statistics

Standard deviation: 7641.51636
Coefficient of variation (CV): 0.08605252877
Kurtosis: -0.5342936433
Mean: 88800.60202
Median Absolute Deviation (MAD): 1030
Skewness: -1.172012811
Sum: 1328279405
Variance: 58392772.28
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
94130                   145  1.0%
92190                   145  1.0%
94300                   145  1.0%
94340                   145  1.0%
94500                   145  1.0%
78140                   145  1.0%
94700                   145  1.0%
95100                   145  1.0%
75006                   145  1.0%
75014                   145  1.0%
Other values (94)     13508  90.3%

Minimum 5 values

Value                 Count  Frequency (%)
75001                   145  1.0%
75002                   145  1.0%
75003                   145  1.0%
75004                   145  1.0%
75005                   145  1.0%

Maximum 5 values

Value                 Count  Frequency (%)
95880                   145  1.0%
95870                   145  1.0%
95100                   145  1.0%
94800                   145  1.0%
94700                   145  1.0%

date
Date

Distinct count: 145
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 117.0 KiB
Minimum: 2018-01-01 00:00:00
Maximum: 2018-06-18 00:00:00
Histogram

daily_data_points
Real number (ℝ≥0)

Distinct count: 12
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1438.563043187592
Minimum: 1411
Maximum: 1440
Zeros: 0
Zeros (%): 0.0%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 1411
5-th percentile: 1434
Q1: 1439
Median: 1440
Q3: 1440
95-th percentile: 1440
Maximum: 1440
Range: 29
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 4.378831957
Coefficient of variation (CV): 0.003043892986
Kurtosis: 19.7081417
Mean: 1438.563043
Median Absolute Deviation (MAD): 0
Skewness: -4.385874416
Sum: 21518026
Variance: 19.1741693
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
1440                  10109  67.6%
1439                   2578  17.2%
1438                    721  4.8%
1437                    411  2.7%
1434                    207  1.4%
1425                    207  1.4%
1417                    206  1.4%
1429                    104  0.7%
1436                    104  0.7%
1435                    104  0.7%
Other values (2)        207  1.4%

Minimum 5 values

Value                 Count  Frequency (%)
1411                    104  0.7%
1417                    206  1.4%
1420                    103  0.7%
1425                    207  1.4%
1429                    104  0.7%

Maximum 5 values

Value                 Count  Frequency (%)
1440                  10109  67.6%
1439                   2578  17.2%
1438                    721  4.8%
1437                    411  2.7%
1436                    104  0.7%

dayofweek
Real number (ℝ≥0)

ZEROS
Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9381601818424925
Minimum: 0
Maximum: 6
Zeros: 2374
Zeros (%): 15.9%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.042206884
Coefficient of variation (CV): 0.6950631543
Kurtosis: -1.304166968
Mean: 2.938160182
Median Absolute Deviation (MAD): 2
Skewness: 0.03519677094
Sum: 43949
Variance: 4.170608957
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
0                      2374  15.9%
1                      2269  15.2%
6                      2169  14.5%
4                      2168  14.5%
2                      2062  13.8%
5                      2061  13.8%
3                      1855  12.4%

Minimum 5 values

Value                 Count  Frequency (%)
0                      2374  15.9%
1                      2269  15.2%
2                      2062  13.8%
3                      1855  12.4%
4                      2168  14.5%

Maximum 5 values

Value                 Count  Frequency (%)
6                      2169  14.5%
5                      2061  13.8%
4                      2168  14.5%
3                      1855  12.4%
2                      2062  13.8%

day_type
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 117.0 KiB
Common values

Value                 Count  Frequency (%)
weekday               10728  71.7%
weekend                4230  28.3%

Length

Max length: 7
Mean length: 7
Min length: 7

Value                 Count  Frequency (%)
Lowercase_Letter          7  100.0%

Value                 Count  Frequency (%)
Latin                     7  100.0%

Value                 Count  Frequency (%)
ASCII                     7  100.0%
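
The day_type counts are consistent with the dayofweek frequency table if values 0 to 4 are treated as weekdays and 5 to 6 as weekend days. That mapping is an assumption inferred from the sample rows, not something the report states. A minimal sketch of the cross-check, with a hypothetical helper `day_type`:

```python
# Counts per dayofweek value, copied from the dayofweek frequency table.
dayofweek_counts = {0: 2374, 1: 2269, 2: 2062, 3: 1855, 4: 2168, 5: 2061, 6: 2169}

def day_type(dayofweek: int) -> str:
    """Assumed rule: 0-4 are weekdays, 5-6 are weekend days."""
    return "weekend" if dayofweek >= 5 else "weekday"

# Aggregate the per-day counts into the two day_type categories.
totals = {"weekday": 0, "weekend": 0}
for day, count in dayofweek_counts.items():
    totals[day_type(day)] += count

print(totals)  # {'weekday': 10728, 'weekend': 4230}
```

Both totals match the day_type frequencies reported above, and together they sum to the 14958 observations.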
 

bluecars_taken_sum
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 921
Unique (%): 6.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.36889958550609
Minimum: 0
Maximum: 1255
Zeros: 42
Zeros (%): 0.3%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 20
Median: 47
Q3: 138
95-th percentile: 529
Maximum: 1255
Range: 1255
Interquartile range (IQR): 118

Descriptive statistics

Standard deviation: 185.2157701
Coefficient of variation (CV): 1.454167938
Kurtosis: 5.681815338
Mean: 127.3688996
Median Absolute Deviation (MAD): 35
Skewness: 2.346092262
Sum: 1905184
Variance: 34304.88149
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
12                      235  1.6%
11                      232  1.6%
14                      231  1.5%
9                       225  1.5%
10                      221  1.5%
13                      218  1.5%
16                      195  1.3%
7                       194  1.3%
15                      194  1.3%
20                      193  1.3%
Other values (911)    12820  85.7%

Minimum 5 values

Value                 Count  Frequency (%)
0                        42  0.3%
1                        95  0.6%
2                       118  0.8%
3                       155  1.0%
4                       146  1.0%

Maximum 5 values

Value                 Count  Frequency (%)
1255                      1  < 0.1%
1248                      1  < 0.1%
1209                      2  < 0.1%
1186                      1  < 0.1%
1164                      1  < 0.1%

bluecars_returned_sum
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 912
Unique (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.3550608370103
Minimum: 0
Maximum: 1271
Zeros: 17
Zeros (%): 0.1%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 21
Median: 47
Q3: 137
95-th percentile: 534
Maximum: 1271
Range: 1271
Interquartile range (IQR): 116

Descriptive statistics

Standard deviation: 185.4304604
Coefficient of variation (CV): 1.456011714
Kurtosis: 5.756299107
Mean: 127.3550608
Median Absolute Deviation (MAD): 34
Skewness: 2.357224532
Sum: 1904977
Variance: 34384.45566
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
12                      236  1.6%
13                      236  1.6%
17                      226  1.5%
11                      222  1.5%
10                      222  1.5%
9                       220  1.5%
14                      213  1.4%
18                      198  1.3%
15                      198  1.3%
22                      194  1.3%
Other values (902)    12793  85.5%

Minimum 5 values

Value                 Count  Frequency (%)
0                        17  0.1%
1                        90  0.6%
2                       117  0.8%
3                       142  0.9%
4                       141  0.9%

Maximum 5 values

Value                 Count  Frequency (%)
1271                      1  < 0.1%
1230                      1  < 0.1%
1214                      1  < 0.1%
1211                      1  < 0.1%
1210                      1  < 0.1%

utilib_taken_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 46
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.7297098542585907
Minimum: 0
Maximum: 47
Zeros: 4972
Zeros (%): 33.2%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 17
Maximum: 47
Range: 47
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 5.789643488
Coefficient of variation (CV): 1.55230399
Kurtosis: 7.098757084
Mean: 3.729709854
Median Absolute Deviation (MAD): 1
Skewness: 2.488247194
Sum: 55789
Variance: 33.51997171
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
0                      4972  33.2%
1                      2750  18.4%
2                      1664  11.1%
3                      1111  7.4%
4                       745  5.0%
5                       565  3.8%
6                       432  2.9%
7                       315  2.1%
8                       302  2.0%
9                       240  1.6%
Other values (36)      1862  12.4%

Minimum 5 values

Value                 Count  Frequency (%)
0                      4972  33.2%
1                      2750  18.4%
2                      1664  11.1%
3                      1111  7.4%
4                       745  5.0%

Maximum 5 values

Value                 Count  Frequency (%)
47                        1  < 0.1%
46                        1  < 0.1%
45                        1  < 0.1%
43                        1  < 0.1%
42                        1  < 0.1%

utilib_returned_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 47
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.7323840085572937
Minimum: 0
Maximum: 47
Zeros: 4909
Zeros (%): 32.8%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 17
Maximum: 47
Range: 47
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 5.797753201
Coefficient of variation (CV): 1.553364602
Kurtosis: 7.204123791
Mean: 3.732384009
Median Absolute Deviation (MAD): 1
Skewness: 2.501348695
Sum: 55829
Variance: 33.61394218
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
0                      4909  32.8%
1                      2798  18.7%
2                      1727  11.5%
3                      1060  7.1%
4                       782  5.2%
5                       537  3.6%
6                       409  2.7%
7                       348  2.3%
8                       297  2.0%
9                       226  1.5%
Other values (37)      1865  12.5%

Minimum 5 values

Value                 Count  Frequency (%)
0                      4909  32.8%
1                      2798  18.7%
2                      1727  11.5%
3                      1060  7.1%
4                       782  5.2%

Maximum 5 values

Value                 Count  Frequency (%)
47                        1  < 0.1%
45                        1  < 0.1%
44                        1  < 0.1%
43                        2  < 0.1%
42                        1  < 0.1%

utilib_14_taken_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 91
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.6936087712261
Minimum: 0
Maximum: 100
Zeros: 2605
Zeros (%): 17.4%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 4
Q3: 10
95-th percentile: 37
Maximum: 100
Range: 100
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 12.85855496
Coefficient of variation (CV): 1.479081391
Kurtosis: 6.859179587
Mean: 8.693608771
Median Absolute Deviation (MAD): 3
Skewness: 2.467497906
Sum: 130039
Variance: 165.3424356
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
0                      2605  17.4%
1                      2022  13.5%
2                      1599  10.7%
3                      1242  8.3%
4                       987  6.6%
5                       742  5.0%
6                       634  4.2%
7                       477  3.2%
8                       392  2.6%
9                       333  2.2%
Other values (81)      3925  26.2%

Minimum 5 values

Value                 Count  Frequency (%)
0                      2605  17.4%
1                      2022  13.5%
2                      1599  10.7%
3                      1242  8.3%
4                       987  6.6%

Maximum 5 values

Value                 Count  Frequency (%)
100                       1  < 0.1%
94                        1  < 0.1%
93                        1  < 0.1%
91                        1  < 0.1%
90                        1  < 0.1%

utilib_14_returned_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 92
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.69247225564915
Minimum: 0
Maximum: 96
Zeros: 2568
Zeros (%): 17.2%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 10
95-th percentile: 37
Maximum: 96
Range: 96
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 12.85770887
Coefficient of variation (CV): 1.479177442
Kurtosis: 6.850342393
Mean: 8.692472256
Median Absolute Deviation (MAD): 3
Skewness: 2.467541735
Sum: 130022
Variance: 165.3206774
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
0                      2568  17.2%
1                      2041  13.6%
2                      1629  10.9%
3                      1293  8.6%
4                       933  6.2%
5                       770  5.1%
6                       573  3.8%
7                       479  3.2%
8                       383  2.6%
9                       368  2.5%
Other values (82)      3921  26.2%

Minimum 5 values

Value                 Count  Frequency (%)
0                      2568  17.2%
1                      2041  13.6%
2                      1629  10.9%
3                      1293  8.6%
4                       933  6.2%

Maximum 5 values

Value                 Count  Frequency (%)
96                        1  < 0.1%
94                        2  < 0.1%
93                        1  < 0.1%
90                        1  < 0.1%
89                        4  < 0.1%

slots_freed_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 289
Unique (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.865222623345367
Minimum: 0
Maximum: 344
Zeros: 9492
Zeros (%): 63.5%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 5
95-th percentile: 150
Maximum: 344
Range: 344
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 52.25793121
Coefficient of variation (CV): 2.285476598
Kurtosis: 6.06424788
Mean: 22.86522262
Median Absolute Deviation (MAD): 0
Skewness: 2.548047338
Sum: 342018
Variance: 2730.891375
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
0                      9492  63.5%
1                       495  3.3%
2                       453  3.0%
3                       365  2.4%
4                       311  2.1%
5                       224  1.5%
6                       169  1.1%
7                       112  0.7%
8                        89  0.6%
9                        74  0.5%
Other values (279)     3174  21.2%

Minimum 5 values

Value                 Count  Frequency (%)
0                      9492  63.5%
1                       495  3.3%
2                       453  3.0%
3                       365  2.4%
4                       311  2.1%

Maximum 5 values

Value                 Count  Frequency (%)
344                       1  < 0.1%
334                       1  < 0.1%
330                       1  < 0.1%
322                       1  < 0.1%
319                       3  < 0.1%

slots_taken_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 292
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.86923385479342
Minimum: 0
Maximum: 349
Zeros: 9499
Zeros (%): 63.5%
Memory size: 117.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 5
95-th percentile: 151
Maximum: 349
Range: 349
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 52.29121544
Coefficient of variation (CV): 2.286531144
Kurtosis: 6.071606887
Mean: 22.86923385
Median Absolute Deviation (MAD): 0
Skewness: 2.549361753
Sum: 342078
Variance: 2734.371212
Histogram with fixed size bins (bins=10)
Common values

Value                 Count  Frequency (%)
0                      9499  63.5%
1                       494  3.3%
2                       450  3.0%
3                       374  2.5%
4                       294  2.0%
5                       223  1.5%
6                       164  1.1%
7                       131  0.9%
8                        91  0.6%
9                        66  0.4%
Other values (282)     3172  21.2%

Minimum 5 values

Value                 Count  Frequency (%)
0                      9499  63.5%
1                       494  3.3%
2                       450  3.0%
3                       374  2.5%
4                       294  2.0%

Maximum 5 values

Value                 Count  Frequency (%)
349                       1  < 0.1%
330                       1  < 0.1%
328                       1  < 0.1%
326                       1  < 0.1%
322                       1  < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
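
The calculation described above can be sketched in a few lines of standard-library Python. This illustrates the definition and is not the code pandas-profiling runs; the function name `pearson_r` and the toy data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Hypothetical toy data, not taken from the profiled dataset.
taken = [10, 20, 30, 40, 50]
returned = [12, 19, 33, 38, 52]

r = pearson_r(taken, returned)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([3 * v + 7 for v in taken], returned)
print(round(r, 4), round(r_rescaled, 4))
```

Dividing by n or by n - 1 gives the same r, as long as the covariance and both standard deviations use the same divisor.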

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
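
As a rough sketch of that recipe, the snippet below ranks each variable and then applies the Pearson formula to the ranks. Tied values receive the average of the ranks they span, which is a common convention assumed here rather than stated in the report. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data ρ reaches 1 (up to floating-point rounding) because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.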

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
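
A minimal sketch of this pair-counting definition follows. It implements the plain form described above, in which tied pairs count as neither concordant nor discordant. Library implementations often apply a tie correction, so their results can differ on data with many ties, such as the zero-heavy columns in this report. The function name `kendall_tau` and the toy data are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables order the pair the same way
        elif product < 0:
            discordant += 1  # the variables order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Hypothetical toy data: one swapped pair out of six.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 4))  # 5 concordant, 1 discordant, 6 pairs -> 0.6667
```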

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

,df_index,postal_code,date,daily_data_points,dayofweek,day_type,bluecars_taken_sum,bluecars_returned_sum,utilib_taken_sum,utilib_returned_sum,utilib_14_taken_sum,utilib_14_returned_sum,slots_freed_sum,slots_taken_sum
0,0,75001,2018-01-01,1440,0,weekday,110,103,3,2,10,9,22,20
1,1,75001,2018-01-02,1438,1,weekday,98,94,1,1,8,8,23,22
2,2,75001,2018-01-03,1439,2,weekday,138,139,0,0,2,2,27,27
3,4,75001,2018-01-05,1440,4,weekday,114,117,3,3,6,6,18,20
4,5,75001,2018-01-06,1437,5,weekend,187,185,6,6,7,8,38,35
5,6,75001,2018-01-07,1440,6,weekend,180,180,2,2,10,9,34,34
6,7,75001,2018-01-08,1438,0,weekday,84,83,3,3,10,10,14,15
7,8,75001,2018-01-09,1439,1,weekday,81,84,1,1,4,4,15,15
8,9,75001,2018-01-10,1440,2,weekday,88,85,5,5,11,11,23,22
9,10,75001,2018-01-11,1440,3,weekday,125,125,3,4,13,13,22,22

Last rows

,df_index,postal_code,date,daily_data_points,dayofweek,day_type,bluecars_taken_sum,bluecars_returned_sum,utilib_taken_sum,utilib_returned_sum,utilib_14_taken_sum,utilib_14_returned_sum,slots_freed_sum,slots_taken_sum
14948,16074,95880,2018-06-09,1440,5,weekend,15,15,0,0,1,2,0,0
14949,16075,95880,2018-06-10,1440,6,weekend,34,32,0,0,1,0,0,0
14950,16076,95880,2018-06-11,1440,0,weekday,17,18,0,0,0,0,0,0
14951,16077,95880,2018-06-12,1439,1,weekday,25,25,0,0,0,0,0,0
14952,16078,95880,2018-06-13,1440,2,weekday,12,13,0,0,1,1,0,0
14953,16079,95880,2018-06-14,1439,3,weekday,15,13,0,0,0,0,0,0
14954,16080,95880,2018-06-15,1440,4,weekday,15,10,0,0,2,3,0,0
14955,16081,95880,2018-06-16,1440,5,weekend,19,19,0,0,2,1,0,0
14956,16082,95880,2018-06-17,1440,6,weekend,33,35,1,1,0,0,0,0
14957,16083,95880,2018-06-18,1440,0,weekday,11,14,3,5,2,2,0,0